    Sentiment Classification of Russian Texts Using Automatically Generated Thesaurus

    This paper presents an approach to sentiment classification of Russian texts that applies an automatically generated thesaurus of the subject area. The approach combines a standard machine learning classifier with an embedded procedure that uses thesaurus relationships to improve sentiment analysis. The thesaurus is generated fully automatically and requires no expert involvement in the classification process. Experiments with four Russian-language text corpora show that applying the thesaurus is effective for sentiment classification.

    Sentiment Classification into Three Classes Applying Multinomial Bayes Algorithm, N-grams, and Thesaurus

    The paper develops a method that classifies texts in English and Russian by sentiment into positive, negative, and neutral. The proposed method is based on the Multinomial Naive Bayes classifier with the additional use of n-grams. The classifier is trained either on three classes, or on two contrasting classes with a threshold to separate neutral texts. Experiments with texts on various topics showed a significant improvement in classification quality for reviews from a particular domain. In addition, the use of thesaurus relationships for sentiment classification into three classes was analysed; however, it did not significantly improve the classification results.
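
The two-class-plus-threshold variant described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ThresholdedNaiveBayes`, the use of the log-odds as the thresholded quantity, the threshold value 0.5, the Laplace smoothing, and whitespace tokenisation are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a whitespace-tokenised text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class ThresholdedNaiveBayes:
    """Multinomial Naive Bayes trained on two contrasting classes ('pos', 'neg').

    Hypothetical sketch: texts whose log-odds fall inside (-threshold, threshold)
    are labelled 'neutral'.
    """

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha = alpha    # Laplace smoothing constant (assumed)
        self.n_max = n_max    # longest n-gram used as a feature

    def fit(self, texts, labels):
        self.counts = {"pos": Counter(), "neg": Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(word_ngrams(text, self.n_max))
        self.vocab = set(self.counts["pos"]) | set(self.counts["neg"])
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def log_odds(self, text):
        """log P(pos | text) - log P(neg | text); unseen features are skipped."""
        score = math.log(self.doc_counts["pos"] / self.doc_counts["neg"])
        denom = {c: self.totals[c] + self.alpha * len(self.vocab) for c in self.counts}
        for feat in word_ngrams(text, self.n_max):
            if feat not in self.vocab:
                continue
            p_pos = (self.counts["pos"][feat] + self.alpha) / denom["pos"]
            p_neg = (self.counts["neg"][feat] + self.alpha) / denom["neg"]
            score += math.log(p_pos / p_neg)
        return score

    def predict(self, text, threshold=0.5):
        score = self.log_odds(text)
        if abs(score) < threshold:
            return "neutral"
        return "pos" if score > 0 else "neg"
```

In practice the threshold would be tuned on held-out data; a larger value sends more borderline texts to the neutral class.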

    Classification of Mass Media Articles by Category and Subject-Area Relevance

    The research addresses the classification of news articles about P. G. Demidov Yaroslavl State University (YarSU) into 4 categories: “society”, “education”, “science and technologies”, and “not relevant”. The proposed approaches are based on the BERT neural network and on machine learning methods (SVM, Logistic Regression, K-Neighbors, Random Forest) in combination with different embedding types: Word2Vec, FastText, TF-IDF, GPT-3. Text preprocessing approaches are also considered to achieve higher classification quality. The experiments showed that the SVM classifier with TF-IDF embedding, trained on full article texts with titles, achieved the best result. Its micro-F-measure and macro-F-measure are 0.8214 and 0.8308 respectively. The BERT neural network trained on fragments of paragraphs with YarSU mentions, from which the first 128 words and the last 384 words were taken, showed comparable results. The resulting micro-F-measure and macro-F-measure are 0.8304 and 0.8181 respectively. Thus, paragraphs that mention the target organisation are enough to classify texts by category efficiently.
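
As a rough illustration of the two quantities mentioned above, the sketch below computes micro- and macro-averaged F-measures for single-label predictions and builds a "first 128 words plus last 384 words" fragment. The function names `f_measures` and `head_tail` are hypothetical, and returning short inputs unchanged is an assumption rather than something the abstract states.

```python
def f_measures(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1. Micro: F1 over pooled counts.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


def head_tail(words, head=128, tail=384):
    """Keep the first `head` and last `tail` words; short inputs are returned whole (assumed)."""
    if len(words) <= head + tail:
        return list(words)
    return list(words[:head]) + list(words[-tail:])
```

Macro-averaging weights every category equally, so it can exceed or fall below the micro value depending on how well the rarer categories are classified.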

    Multiplicity dependence of light (anti-)nuclei production in p–Pb collisions at $\sqrt{s_{\rm NN}}$ = 5.02 TeV

    The measurement of the deuteron and anti-deuteron production in the rapidity range $-1 < y < 0$ as a function of transverse momentum and event multiplicity in p–Pb collisions at $\sqrt{s_{\rm NN}}$ = 5.02 TeV is presented. (Anti-)deuterons are identified via their specific energy loss d$E$/d$x$ and via their time-of-flight. Their production in p–Pb collisions is compared to pp and Pb–Pb collisions and is discussed within the context of thermal and coalescence models. The ratio of integrated yields of deuterons to protons (d/p) shows a significant increase as a function of the charged-particle multiplicity of the event, starting from values similar to those observed in pp collisions at low multiplicities and approaching those observed in Pb–Pb collisions at high multiplicities. The mean transverse particle momenta are extracted from the deuteron spectra and the values are similar to those obtained for p and $\Lambda$ particles. Thus, deuteron spectra do not follow mass ordering. This behaviour is in contrast to the trend observed for non-composite particles in p–Pb collisions. In addition, the production of the rare $^{3}$He and $^{3}\overline{\rm He}$ nuclei has been studied. The spectrum corresponding to all non-single diffractive p–Pb collisions is obtained in the rapidity window $-1 < y < 0$ and the $p_{\rm T}$-integrated yield d$N$/d$y$ is extracted. It is found that the yields of protons, deuterons, and $^{3}$He, normalised by the spin degeneracy factor, follow an exponential decrease with mass number.

    Observation of medium-induced yield enhancement and acoplanarity broadening of low-$p_{\rm T}$ jets from measurements in pp and central Pb-Pb collisions at $\sqrt{s_{\rm NN}}=5.02$ TeV

    The ALICE Collaboration reports the measurement of semi-inclusive distributions of charged-particle jets recoiling from a high transverse momentum (high $p_{\rm T}$) hadron trigger in proton-proton and central Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV. A data-driven statistical method is used to mitigate the large uncorrelated background in central Pb-Pb collisions. Recoil jet distributions are reported for jet resolution parameter $R=0.2$, 0.4, and 0.5 in the range $7 < p_{\rm T,jet} < 140$ GeV/$c$ and trigger-recoil jet azimuthal separation $\pi/2 < \Delta\varphi < \pi$. The measurements exhibit a marked medium-induced jet yield enhancement at low $p_{\rm T}$ and at large azimuthal deviation from $\Delta\varphi\sim\pi$. The enhancement is characterized by its dependence on $\Delta\varphi$, which has a slope that differs from zero by 4.7$\sigma$. Comparisons to model calculations incorporating different formulations of jet quenching are reported. These comparisons indicate that the observed yield enhancement arises from the response of the QGP medium to jet propagation.

    Probing the Chiral Magnetic Wave with charge-dependent flow measurements in Pb-Pb collisions at the LHC

    The Chiral Magnetic Wave (CMW) phenomenon is essential to provide insights into the strong interaction in QCD, the properties of the quark-gluon plasma, and the topological characteristics of the early universe, offering a deeper understanding of fundamental physics in high-energy collisions. Measurements of the charge-dependent anisotropic flow coefficients are studied in Pb-Pb collisions at center-of-mass energy per nucleon-nucleon collision $\sqrt{s_{\mathrm{NN}}} = 5.02$ TeV to probe the CMW. In particular, the slope of the normalized difference in elliptic ($v_{2}$) and triangular ($v_{3}$) flow coefficients of positively and negatively charged particles as a function of their event-wise normalized number difference is reported for inclusive and identified particles. The slope $r_{3}^{\rm Norm}$ is found to be larger than zero and to have a magnitude similar to $r_{2}^{\rm Norm}$, thus pointing to a large background contribution for these measurements. Furthermore, $r_{2}^{\rm Norm}$ can be described by a blast wave model calculation that incorporates local charge conservation. In addition, using the event shape engineering technique yields a fraction of CMW ($f_{\rm CMW}$) contribution to this measurement which is compatible with zero. This measurement provides the very first upper limit for $f_{\rm CMW}$, and in the 10-60% centrality interval it is found to be 26% (38%) at 95% (99.7%) confidence level.

    Prompt and non-prompt J/$\psi$ production at midrapidity in Pb-Pb collisions at $\sqrt{s_{\mathrm{NN}}}$ = 5.02 TeV

    The transverse momentum ($p_{\rm T}$) and centrality dependence of the nuclear modification factor $R_{\rm AA}$ of prompt and non-prompt J/$\psi$, the latter originating from the weak decays of beauty hadrons, have been measured by the ALICE collaboration in Pb-Pb collisions at $\sqrt{s_{\mathrm{NN}}}$ = 5.02 TeV. The measurements are carried out through the ${\rm e}^{+}{\rm e}^{-}$ decay channel at midrapidity ($|y|$ [...] 5 GeV/$c$, which becomes stronger with increasing collision centrality. The results are consistent with similar LHC measurements in the overlapping $p_{\rm T}$ intervals, and cover the kinematic region down to $p_{\rm T}$ = 1.5 GeV/$c$ at midrapidity, not accessible by other LHC experiments. The suppression of prompt J/$\psi$ in central and semicentral collisions exhibits a decreasing trend towards lower transverse momentum, described within uncertainties by models implementing J/$\psi$ production from recombination of c and $\overline{\rm c}$ quarks produced independently in different partonic scatterings. At high transverse momentum, transport models including quarkonium dissociation are able to describe the suppression for prompt J/$\psi$. For non-prompt J/$\psi$, the suppression predicted by models including both collisional and radiative processes for the computation of the beauty-quark energy loss inside the quark-gluon plasma is consistent with measurements within uncertainties.

    Charged-particle production as a function of the relative transverse activity classifier in pp, p-Pb, and Pb-Pb collisions at the LHC

    Measurements of charged-particle production in pp, p-Pb, and Pb-Pb collisions in the toward, away, and transverse regions with the ALICE detector are discussed. These regions are defined event-by-event relative to the azimuthal direction of the charged trigger particle, which is the reconstructed particle with the largest transverse momentum ($p_{\mathrm{T}}^{\rm trig}$) in the range $8 < p_{\mathrm{T}}^{\rm trig} < 15$ GeV/$c$. The toward and away regions contain the primary and recoil jets, respectively; both regions are accompanied by the underlying event (UE). In contrast, the transverse region perpendicular to the direction of the trigger particle is dominated by the so-called UE dynamics, and also includes contributions from initial- and final-state radiation. The relative transverse activity classifier, $R_{\mathrm{T}}=N_{\mathrm{ch}}^{\mathrm{T}}/\langle N_{\mathrm{ch}}^{\mathrm{T}}\rangle$, is used to group events according to their UE activity, where $N_{\mathrm{ch}}^{\mathrm{T}}$ is the charged-particle multiplicity per event in the transverse region and $\langle N_{\mathrm{ch}}^{\mathrm{T}}\rangle$ is the mean value over the whole analysed sample. The energy dependence of the $R_{\mathrm{T}}$ distributions in pp collisions at $\sqrt{s}=2.76$, 5.02, 7, and 13 TeV is reported, exploring the Koba-Nielsen-Olesen (KNO) scaling properties of the multiplicity distributions. The first measurements of charged-particle $p_{\rm T}$ spectra as a function of $R_{\mathrm{T}}$ in the three azimuthal regions in pp, p-Pb, and Pb-Pb collisions at $\sqrt{s_{\rm NN}}=5.02$ TeV are also reported. Data are compared with predictions obtained from the event generators PYTHIA 8 and EPOS LHC. This set of measurements is expected to contribute to the understanding of the origin of collective-like effects in small collision systems (pp and p-Pb).
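
The definition of the relative transverse activity classifier above can be sketched in a few lines. This is an illustration only, not the analysis code: the function names are hypothetical, and the region boundaries at $|\Delta\varphi| = \pi/3$ and $2\pi/3$ follow the convention commonly used in underlying-event studies, which the abstract itself does not state.

```python
import math


def region(phi, phi_trig):
    """Classify a particle as 'toward', 'transverse', or 'away' relative to the trigger.

    Assumed boundaries: |dphi| < pi/3 is toward, |dphi| > 2*pi/3 is away.
    """
    # Wrap the azimuthal difference into [-pi, pi) before taking its magnitude.
    dphi = abs((phi - phi_trig + math.pi) % (2 * math.pi) - math.pi)
    if dphi < math.pi / 3:
        return "toward"
    if dphi > 2 * math.pi / 3:
        return "away"
    return "transverse"


def relative_transverse_activity(n_ch_transverse):
    """Return R_T = N_ch^T / <N_ch^T> for each event in the sample."""
    mean = sum(n_ch_transverse) / len(n_ch_transverse)
    return [n / mean for n in n_ch_transverse]
```

For example, a sample with transverse multiplicities 2, 4, and 6 has a mean of 4, so the three events get $R_{\mathrm{T}}$ values of 0.5, 1.0, and 1.5.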

    Measurement of (anti)alpha production in central Pb-Pb collisions at $\sqrt{s_{\rm NN}}$ = 5.02 TeV

    In this letter, measurements of (anti)alpha production in central (0-10%) Pb-Pb collisions at a center-of-mass energy per nucleon-nucleon pair of $\sqrt{s_{\rm NN}}$ = 5.02 TeV are presented, including the first measurement of an antialpha transverse-momentum spectrum. Owing to its large mass, (anti)alpha production yields and transverse-momentum spectra are of particular interest because they provide a stringent test of particle production models. The averaged antialpha and alpha spectrum is included into a common blast-wave fit with lighter particles, indicating that the (anti)alpha also participates in the collective expansion of the medium created in the collision. A blast-wave fit including only protons, (anti)alpha, and other light nuclei results in a similar flow velocity as the fit that includes all particles. A similar flow velocity, but a significantly larger kinetic freeze-out temperature, is obtained when only protons and light nuclei are included in the fit. The coalescence parameter $B_4$ is well described by calculations from a statistical hadronization model but significantly underestimated by calculations assuming nucleus formation via coalescence of nucleons. Similarly, the (anti)alpha-to-proton ratio is well described by the statistical hadronization model. On the other hand, coalescence calculations including approaches with different implementations of the (anti)alpha substructure tend to underestimate the data.

    Light-flavor particle production in high-multiplicity pp collisions at $\sqrt{s}$ = 13 TeV as a function of transverse spherocity

    Results on the transverse spherocity dependence of light-flavor particle production ($\pi$, K, p, $\phi$, ${\rm K^{*0}}$, ${\rm K}^{0}_{\rm S}$, $\Lambda$, $\Xi$) at midrapidity in high-multiplicity pp collisions at $\sqrt{s}$ = 13 TeV were obtained with the ALICE apparatus. The transverse spherocity estimator ($S_{\rm O}^{p_{\rm T}=1}$) categorizes events by their azimuthal topology. Utilizing narrow selections on $S_{\rm O}^{p_{\rm T}=1}$, it is possible to contrast particle production in collisions dominated by many soft initial interactions with that observed in collisions dominated by one or more hard scatterings. Results are reported for two multiplicity estimators covering different pseudorapidity regions. The $S_{\rm O}^{p_{\rm T}=1}$ estimator is found to effectively constrain the hardness of the events when the midrapidity ($|\eta| < 0.8$) estimator is used. The production rates of strange particles are found to be slightly higher for soft isotropic topologies, and severely suppressed in hard jet-like topologies. These effects are more pronounced for hadrons with larger mass and strangeness content, and observed when the topological selection is done within a narrow multiplicity interval. This demonstrates that an important aspect of the universal scaling of strangeness enhancement with final-state multiplicity is that high-multiplicity collisions are dominated by soft, isotropic processes. On the contrary, strangeness production in events with jet-like processes is significantly reduced. The results presented in this article are compared with several QCD-inspired Monte Carlo event generators. Models that incorporate a two-component phenomenology, either through mechanisms accounting for string density, or thermal production, are able to describe the observed strangeness enhancement as a function of $S_{\rm O}^{p_{\rm T}=1}$.